Mall Customer Segmentation

Motivation

Customer segmentation, the practice of dividing customers into groups that share similar traits, is a very useful concept in marketing. (It is distinct from market basket analysis, which looks for products that are frequently bought together.) By identifying distinguishing customer traits, businesses can understand their customers on a deeper level and target different groups with more strategic marketing and advertising.

Using K-means clustering, an unsupervised machine learning technique, we can group similar customers and identify several types of customer profiles.

This dataset consists of hypothetical customer data in a shopping mall.

Data source: https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python

Sections:

  1. Exploratory data analysis
  2. Data preparation
  3. Training model
  4. Model evaluation

Data Information

This dataset has five features:

1. CustomerID: Unique ID assigned to the customer
2. Gender: Gender of the customer
3. Age: Age of the customer
4. Annual Income (k$): Annual income of the customer, in thousand dollars
5. Spending Score (1-100): Score assigned by the mall based on customer behavior and spending nature, ranging from 1 to 100

In total, there are 200 customer records in this dataset.

Import Libraries & Data
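
As a minimal sketch of this step, the snippet below reads the CSV with pandas and checks that the columns listed under Data Information are present. The file name `Mall_Customers.csv` and the helper `load_customers` are assumptions made for illustration; adjust the path and the column names if the downloaded file differs.

```python
import pandas as pd

# Assumed file name; point this at wherever the Kaggle CSV was saved.
DATA_PATH = "Mall_Customers.csv"

# Column names as listed in the Data Information section.
EXPECTED_COLUMNS = [
    "CustomerID",
    "Gender",
    "Age",
    "Annual Income (k$)",
    "Spending Score (1-100)",
]


def load_customers(path=DATA_PATH):
    """Read the CSV and fail early if a documented column is absent."""
    df = pd.read_csv(path)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df
```

Calling `df = load_customers()` followed by `df.head()` and `df.shape` gives a first look at the records.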

Exploratory Data Analysis

There is no missing data.
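
One way to verify this, sketched under the assumption that the data is held in a pandas DataFrame, is to count missing values per column. The helper name `missing_report` is hypothetical.

```python
import pandas as pd


def missing_report(df):
    """Return the number of missing values per column and the overall total."""
    per_column = df.isna().sum()
    return per_column, int(per_column.sum())
```

A total of 0 confirms that no imputation or row removal is needed.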

Distribution

By plotting histograms, we can observe the distribution of each feature.

Gender

There are more female customers.
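
The gender split can also be checked numerically. The sketch below assumes a pandas DataFrame with a `Gender` column as described under Data Information; the helper name `gender_share` is hypothetical.

```python
import pandas as pd


def gender_share(df):
    """Return the count and percentage of customers for each gender."""
    counts = df["Gender"].value_counts()
    percent = df["Gender"].value_counts(normalize=True) * 100
    return pd.DataFrame({"count": counts, "percent": percent})
```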

Relationship between features

Plotting a heatmap of the correlation between the features shows that there are very weak associations. However, from the pairplot, we observe that there are stronger associations between the features when separating them by gender.

For example, there is a moderate negative correlation between age and spending score.

We also observe that the female customers are older and have higher annual income and spending scores.
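
The numbers behind the heatmap and the per-gender comparison can be reproduced roughly as follows. This is a sketch that assumes Pearson correlation (the pandas default) and the column names from Data Information; the helper names are hypothetical.

```python
import pandas as pd

# Numeric features from the Data Information section (CustomerID excluded).
NUMERIC_COLS = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]


def correlations_by_gender(df):
    """Return the overall correlation matrix and one matrix per gender."""
    overall = df[NUMERIC_COLS].corr()
    by_gender = {
        gender: group[NUMERIC_COLS].corr()
        for gender, group in df.groupby("Gender")
    }
    return overall, by_gender


def mean_by_gender(df):
    """Return the mean of each numeric feature per gender."""
    return df.groupby("Gender")[NUMERIC_COLS].mean()
```

Comparing `overall` with the entries of `by_gender` shows whether separating by gender strengthens an association, and `mean_by_gender(df)` supports the comparison of age, income and spending score between the two groups.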

Age and annual income

Customers aged 30 to 60 generally have higher annual income than younger or older customers.

Age and spending score

We can see that younger customers tend to have high spending scores, while customers who are 30 years old or above have low or moderate spending scores.
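
The age-related observations above can be backed by a simple grouped summary. The sketch below assumes three age bands (under 30, 30 to 60, over 60) chosen to match the thresholds mentioned in the text; the band edges and the helper name `mean_by_age_band` are illustrative assumptions.

```python
import pandas as pd

# Assumed band edges: (0, 29], (29, 60], (60, 120].
AGE_BINS = [0, 29, 60, 120]
AGE_LABELS = ["under 30", "30 to 60", "over 60"]


def mean_by_age_band(df, value_col):
    """Return the mean of value_col for each age band present in the data."""
    bands = pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS)
    return df.groupby(bands, observed=True)[value_col].mean()
```

For example, `mean_by_age_band(df, "Annual Income (k$)")` and `mean_by_age_band(df, "Spending Score (1-100)")` summarise the two scatter plots in numbers.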

Annual income and spending score

We can observe 5 groups of customers here:

  1. Customers with low annual income and low spending score. These customers are likely to spend within their means.
  2. Customers with low annual income and high spending score. These customers enjoy spending on goods and leisure.
  3. Customers with moderate annual income and moderate spending score. Promotions might be useful in getting them to spend more.
  4. Customers with high annual income and low spending score. This group is also very valuable, as they have the spending power to pay for products.
  5. Customers with high annual income and high spending score. A loyalty programme might help retain these customers in the long term.

Determining number of clusters (k value)

We will group customers using three features: annual income, spending score and age.

To determine the appropriate number of clusters, we will be using the elbow method.

Based on the graph, WCSS (the within-cluster sum of squares) decreases sharply up to k = 5, where the elbow forms, so we take 5 as the number of clusters for the model.
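
The elbow curve can be computed roughly as follows. This is a sketch rather than the exact code behind the graph: it assumes scikit-learn's `KMeans`, unscaled features, a candidate range of k = 1 to 10, and arbitrary `n_init` and `random_state` values. The helper name `wcss_by_k` is hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The three clustering features named above.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)", "Age"]


def wcss_by_k(df, k_values=range(1, 11), random_state=42):
    """Fit K-means for each k and return {k: WCSS}.

    WCSS is read from the fitted model's inertia_ attribute, the sum of
    squared distances from each sample to its nearest cluster centre.
    """
    X = df[FEATURES].to_numpy()
    wcss = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        wcss[k] = model.inertia_
    return wcss
```

Plotting the returned values against k produces the elbow graph; the elbow is the point after which adding another cluster reduces WCSS only slightly.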

Train model
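
A minimal sketch of the training step, assuming scikit-learn's `KMeans` with k = 5 on the three unscaled features. The `n_init` and `random_state` values, the `Cluster` column name and the helper `fit_segments` are illustrative assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The three clustering features: annual income, spending score and age.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)", "Age"]


def fit_segments(df, n_clusters=5, random_state=42):
    """Fit K-means and return the model, labelled data and a cluster profile."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labelled = df.copy()
    # Assign each customer the index of its nearest cluster centre.
    labelled["Cluster"] = model.fit_predict(df[FEATURES])
    # Summarise each cluster by its feature means and number of customers.
    profile = labelled.groupby("Cluster")[FEATURES].mean()
    profile["Size"] = labelled.groupby("Cluster").size()
    return model, labelled, profile
```

The `profile` table gives the average income, spending score and age per cluster, which is the basis for describing each customer segment.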

Conclusion

1. Blue cluster

2. Orange cluster

3. Yellow cluster

4. Purple cluster

5. Pink cluster
